Unsupervised Feature Selection for the $k$-means Clustering Problem

نویسندگان

  • Christos Boutsidis
  • Michael W. Mahoney
  • Petros Drineas
چکیده

We present a novel feature selection algorithm for the k-means clustering problem. Our algorithm is randomized and, assuming an accuracy parameter ε ∈ (0, 1), selects and appropriately rescales in an unsupervised manner Θ(k log(k/ε)/ε) features from a dataset of arbitrary dimensions. We prove that, if we run any γ-approximate k-means algorithm (γ ≥ 1) on the features selected using our method, we can find a (1+ (1+ ε)γ)-approximate partition with high probability.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

A New Hybrid Feature Selection using Natural Language Processing for Text Clustering

Text clustering is unsupervised machine learning method.It needs representation of objects and similarity measure. which compares distribution of features between objects. For the high dimensionality of feature space performance of clustering algorithms decreases.Two techniques are used to deal with this problem: feature extraction and feature selection.In this paper, we describe the hybrid met...

متن کامل

A hybrid DEA-based K-means and invasive weed optimization for facility location problem

In this paper, instead of the classical approach to the multi-criteria location selection problem, a new approach was presented based on selecting a portfolio of locations. First, the indices affecting the selection of maintenance stations were collected. The K-means model was used for clustering the maintenance stations. The optimal number of clusters was calculated through the Silhou...

متن کامل

Research of Feature Selection for Text Clustering Based on Cloud Model

Text clustering belongs to the unsupervised machine learning, the discriminability of class attributes cannot be measured in clustering. And the traditional text feature selection methods cannot effectively solve the high-dimensional problem. To overcome the weakness in existing feature selection, this paper proposes a new method which introduces the cloud model theory into feature selection, c...

متن کامل

Hybrid Active Feature Selection For Text Classification

Clustering is the most common form of unsupervised learning.In clustering, it is the distribution and makeup of the data that will determine cluster membership. It needs representation of objects and similarity measure. which compares distribution of features between objects. For the high dimensionality, feature extraction and feature selection improves the performance of clustering algorithms....

متن کامل

Clustering with feature selection using alternating minimization, Application to computational biology

This paper deals with unsupervised clustering with feature selection. The problem is to estimate both labels and a sparse projection matrix of weights. To address this combinatorial non-convex problem maintaining a strict control on the sparsity of the matrix of weights, we propose an alternating minimization of the Frobenius norm criterion. We provide a new efficient algorithm named K-sparse w...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2009